Abstract: In this paper we present an approach to control a vehicle in a hostile environment 
with static obstacles and moving adversaries. The vehicle is required to satisfy a mission 
objective expressed as a temporal logic specification over a set of properties satisfied at regions 
of a partitioned environment. We model the movements of adversaries in between regions of the 
environment as Poisson processes. Furthermore, we assume that the time it takes for the vehicle 
to traverse in between two facets of each region is exponentially distributed, and we obtain the 
rate of this exponential distribution from a simulator of the environment. We capture the motion 
of the vehicle and the vehicle's updates of the adversary distributions as a Markov Decision Process. 
Using tools in Probabilistic Computational Tree Logic, we find a control strategy for the vehicle 
that maximizes the probability of accomplishing the mission objective. We demonstrate our 
approach with illustrative case studies. 



1. INTRODUCTION 

Robot motion planning and control has been widely stud- 
ied in the last twenty years. Recently, temporal logics, 
such as Linear Temporal Logic (LTL) and Computational 
Tree Logic (CTL) have become increasingly popular for 
specifying robotic tasks (see, for example, [Conner et al., 
2007, Karaman and Frazzoli, 2008, Kloetzer and Belta, 
2008b, Loizou and Kyriakopoulos, 2004]). It has been 
shown that temporal logics can serve as rich languages 
capable of specifying complex mission tasks such as "go 
to region A and avoid region B unless regions C or D are 
visited". 

Many of the above-mentioned works that use a temporal 
logic as a specification language rely on the assumption 
that the motion of the robot in the environment can be 
abstracted to a finite transition system by partitioning 
the environment. The transition system must be finite in 
order to allow the use of existing model-checking tools for 
temporal logics (see [Baier et al., 2008]). Furthermore, it 
is assumed that the resultant transition system obtained 
from the abstraction process is deterministic (i.e., an 
available control action deterministically triggers a unique 
transition from one region of the environment to another 
region), and the environment is static. To address environ- 
ments with dynamic obstacles, [Kress-Gazit et al., 2007, 
Topcu et al., 2009] find control strategies that guarantee 
satisfaction of specifications by playing temporal logic 
games with the environment. 
NSF CNS-0834260, and the United Technologies Research Center. 



In practice, due to noise introduced in control (actuator 
error) or the environment (measurement error), a deter- 
ministic transition system may not adequately represent 
the motion of the robot. [Kloetzer and Belta, 2008a] pro- 
posed a control strategy for a purely non-deterministic 
transition system (i.e., a control action enables multiple 
possible transitions to several regions of the environment). 
[Lahijanian et al., 2010] pushed this approach a step fur- 
ther by modeling the motion of the robot as a Markov 
Decision Process (MDP) (i.e., a control action triggers a 
transition from one region to another with some fixed and 
known probability). The transition probabilities of this 
MDP can be obtained from empirical measurements or an 
accurate simulator of the environment. A control strategy 
was then derived to satisfy a mission task specified in 
Probabilistic Computational Tree Logic (PCTL) with the 
maximum probability. 

In this paper, we extend this approach to control a vehicle 
in a dynamic and threat-rich environment with static 
obstacles and moving adversaries. We assume that the 
environment is partitioned into polygonal regions, and a 
high level mission objective is given over some properties 
assigned to these regions. We model the movements of 
adversaries in a region as Poisson processes. Furthermore, 
we model the time it takes for the vehicle to reach from one 
facet of a region to another facet as an exponential random 
variable. This motion model is supported by our realistic 
simulator of the environment, and we obtain the rate of 
this exponential random variable from the simulator. 

The main contribution of this paper is an approach to design 
a reactive control strategy that provides probabilistic 
guarantees of accomplishing the mission in a threat-rich 
environment. This control strategy is reactive in the sense 
that the control of the vehicle is updated whenever the 
vehicle reaches a new region in the environment, or an 
adversary moves in between the current region and its 
adjacent region (i.e. if the vehicle observes movements 
of adversaries, it updates the adversary distributions for 
adjacent regions and chooses a different control action as 
needed). In order to solve this problem, we capture the 
motion of the vehicle, as well as vehicle estimates of the 
adversary distributions in an MDP. This way, we map the 
vehicle control problem to the problem of finding a control 
policy for an MDP such that the probability of satisfying 
a PCTL formula is maximized. For the latter, we use our 
previous approach presented in [Lahijanian et al., 2010]. 

The method that we propose here is closely related 
to "classical" Dynamic Programming (DP) - based ap- 
proaches [Alterovitz et al., 2007]. In particular, it can be 
seen as a simple extension of a Maximum Reachability 
Probability (MRP) problem, which itself is a simple case 
of a stochastic shortest path (SSP) problem [Bertsekas, 
1995]. In these problems, the set of allowed specifications 
is restricted to reaching a given destination state, and the 
corresponding optimal control strategy is found by solving 
one linear program (LP). In contrast, our proposed PCTL 
control framework allows for richer, temporal logic specifi- 
cations and multiple destinations. In addition, through the 
use of nested probabilities, it allows for specifying sub-task 
probabilities. 

The rest of the paper is organized as follows. Sec. 2 in- 
troduces necessary notations, definitions and preliminary 
results. Sec. 3 formulates the problem and describes our 
approach. Sec. 4 describes the construction process of the 
MDP modelling the vehicle in the environment. Sec. 5 ex- 
plains how we generate the desired vehicle control strategy. 
A simulator of the vehicle environment is detailed in Sec. 6 
and some numerical case studies are shown in Sec. 7. Sec. 
8 concludes the paper. 

2. PRELIMINARIES 

2.1 Markov Decision Process and Probability Measure 

Definition 2.1. (Markov Decision Process (MDP)). A labeled 
MDP $\mathcal{M}$ is a tuple $(S, s_0, Act, A, P, \Pi, h)$ where 

• $S$ is a finite set of states; 

• $s_0 \in S$ is the initial state; 

• $Act$ is a set of actions; 

• $A : S \to 2^{Act}$ is a function specifying the enabled 
actions at a state $s$; 

• $P : S \times Act \times S \to [0, 1]$ is a transition probability 
function such that for all states $s \in S$ and actions 
$a \in A(s)$: $\sum_{s' \in S} P(s, a, s') = 1$, and for all actions 
$a \notin A(s)$ and $s' \in S$, $P(s, a, s') = 0$; 

• $\Pi$ is the set of properties; 

• $h : S \to 2^{\Pi}$ is a function that assigns some properties 
in $\Pi$ to each state $s \in S$. 
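
The two conditions that Definition 2.1 places on the transition probability function can be checked mechanically. The following is a minimal sketch, assuming the MDP is stored in plain Python dictionaries; the function name `is_valid_mdp` and the two-state example are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

State = str
Action = str


def is_valid_mdp(states: Set[State],
                 enabled: Dict[State, Set[Action]],
                 P: Dict[Tuple[State, Action, State], float],
                 tol: float = 1e-9) -> bool:
    """Check the conditions on P from Definition 2.1.

    For an enabled action the outgoing probabilities must sum to 1;
    for an action that is not enabled they must all be 0.
    """
    actions = set().union(*enabled.values()) if enabled else set()
    # Also consider actions that only appear in P, so stray entries are caught.
    actions |= {a for (_, a, _) in P}
    for s in states:
        for a in actions:
            total = sum(P.get((s, a, t), 0.0) for t in states)
            if a in enabled.get(s, set()):
                if abs(total - 1.0) > tol:
                    return False
            elif total > tol:
                return False
    return True


# Hypothetical two-state example (NOT the MDP of Fig. 1).
states = {"s0", "s1"}
enabled = {"s0": {"a1", "a2"}, "s1": {"a1"}}
P = {
    ("s0", "a1", "s0"): 0.3, ("s0", "a1", "s1"): 0.7,
    ("s0", "a2", "s1"): 1.0,
    ("s1", "a1", "s1"): 1.0,
}
```

A sparse dictionary keyed by `(s, a, s')` keeps the sketch close to the notation $P(s, a, s')$; a dense array would be the more usual choice for large state spaces.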

A path $\omega$ through an MDP is a sequence of states $\omega = 
s_0 s_1 \ldots s_i s_{i+1} \ldots$ where each transition is induced by a 
choice of action at the current step $i$. We denote the set of 
all finite paths by $Path^{fin}$ and of infinite paths by $Path$. 




Fig. 1. Example of a four-state MDP. $A(s)$ and $h(s)$ are 
shown for each state. The labels over the transitions 
correspond to the transition probabilities. Assume the 
simple control policy $\mathcal{G}$ defined by the mapping $\mathcal{G}(s_0) = a_1$, 
$\mathcal{G}(\cdots s_1) = a_2$, $\mathcal{G}(\cdots s_2) = a_1$ and $\mathcal{G}(\cdots s_3) = a_1$, 
where $\cdots s_i$ denotes any finite path terminating 
in $s_i$. It is easy to see that the probability of the 
finite path $s_0 s_1 s_1$ is $Pr^{fin}_{\mathcal{G}}(s_0 s_1 s_1) = 0.1$. Under $\mathcal{G}$, 
the cylinder set $C(s_0 s_1 s_1)$ consists of all infinite paths 
with this prefix. According to Eq. (1), we have that 
$Pr_{\mathcal{G}}(C(s_0 s_1 s_1)) = Pr^{fin}_{\mathcal{G}}(s_0 s_1 s_1) = 0.1$. 

Definition 2.2. (Control Policy). A control policy $\mathcal{G}$ of 
an MDP model $\mathcal{M}$ is a function mapping a finite path 
$\omega^{fin} = s_0 s_1 s_2 \ldots s_n$ of $\mathcal{M}$ onto an action in $A(s_n)$. In 
other words, a policy is a function $\mathcal{G} : Path^{fin} \to Act$ 
that specifies, for every finite path, the next action to be 
applied. 

Under policy $\mathcal{G}$, an MDP becomes an infinite discrete-time 
Markov Chain, denoted by $\mathcal{M}_{\mathcal{G}}$. Let $Path_{\mathcal{G}} \subseteq Path$ 
and $Path^{fin}_{\mathcal{G}} \subseteq Path^{fin}$ denote the sets of infinite and finite 
paths that can be produced under $\mathcal{G}$. Because there is a 
one-to-one mapping between $Path_{\mathcal{G}}$ and the set of paths 
of $\mathcal{M}_{\mathcal{G}}$, the Markov Chain induces a probability measure 
over $Path_{\mathcal{G}}$ as follows. First, define a measure $Pr^{fin}_{\mathcal{G}}$ 
over the set of finite paths by setting the probability of 
$\omega^{fin} \in Path^{fin}_{\mathcal{G}}$ equal to the product of the corresponding 
transition probabilities in $\mathcal{M}_{\mathcal{G}}$. Then, define $C(\omega^{fin})$ as 
the set of all (infinite) paths $\omega \in Path_{\mathcal{G}}$ with the prefix 
$\omega^{fin}$. The probability measure on the smallest $\sigma$-algebra 
over $Path_{\mathcal{G}}$ containing $C(\omega^{fin})$ for all $\omega^{fin} \in Path^{fin}_{\mathcal{G}}$ is 
the unique measure satisfying 

$$Pr_{\mathcal{G}}(C(\omega^{fin})) = Pr^{fin}_{\mathcal{G}}(\omega^{fin}), \tag{1}$$

for all $\omega^{fin} \in Path^{fin}_{\mathcal{G}}$. A simple MDP is shown in Fig. 1 
to illustrate the above concepts. We refer readers to [Baier 
et al., 2008, Ross, 2006] for more information about MDPs 
and probability measures defined on paths of an MDP. 
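
The measure $Pr^{fin}_{\mathcal{G}}$ on finite paths is a product of transition probabilities along the path, with the action at each step chosen by the policy from the path prefix. The sketch below illustrates this under stated assumptions: the transition probabilities are hypothetical (the labels of Fig. 1 are not reproduced here) and were chosen only so that the path $s_0 s_1 s_1$ has probability 0.1, as in the caption of Fig. 1.

```python
from typing import Callable, Dict, Sequence, Tuple

Transition = Dict[Tuple[str, str, str], float]
Policy = Callable[[Sequence[str]], str]  # finite path -> next action


def finite_path_probability(path: Sequence[str],
                            policy: Policy,
                            P: Transition) -> float:
    """Product of the transition probabilities along a finite path."""
    prob = 1.0
    for i in range(len(path) - 1):
        action = policy(path[: i + 1])  # policy sees the prefix s_0 ... s_i
        prob *= P.get((path[i], action, path[i + 1]), 0.0)
    return prob


# Hypothetical transition probabilities (assumed for illustration only).
P = {
    ("s0", "a1", "s1"): 0.2, ("s0", "a1", "s2"): 0.8,
    ("s1", "a2", "s1"): 0.5, ("s1", "a2", "s3"): 0.5,
    ("s2", "a1", "s2"): 1.0,
    ("s3", "a1", "s3"): 1.0,
}


def policy(prefix: Sequence[str]) -> str:
    """Policy that depends only on the last state of the prefix."""
    return {"s0": "a1", "s1": "a2", "s2": "a1", "s3": "a1"}[prefix[-1]]


prob = finite_path_probability(["s0", "s1", "s1"], policy, P)
```

By Eq. (1), the same number is the probability of the cylinder set of all infinite paths that start with this prefix.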

2.2 Probabilistic Computational Tree Logic 

Probabilistic Computational Tree Logic (PCTL) (Rutten 
et al. [2004]) is a probabilistic extension of CTL that 
includes the probabilistic operator $\mathcal{P}$. PCTL formulas 
are interpreted either as truth values (true or false) or 
quantitative expressions (i.e. find the maximum probability) 
of properties of the MDP. Formulas are constructed by 
connecting properties from a set of properties $\Pi$ using 
Boolean operators ($\neg$ (negation), $\wedge$ (conjunction), and $\to$ 
(implication)), temporal operators ($\bigcirc$ (next), $\mathcal{U}$ (until)), 
and the probabilistic operator $\mathcal{P}$. This allows rich 
specifications given in natural language to be expressed 
as PCTL formulas. 

For example, consider the MDP shown in Fig. 1 and the 
specification $\phi = \mathcal{P}_{max=?}[\neg r_3 \, \mathcal{U} \, r_4]$. In words, this formula 
asks for the maximum probability of reaching the state 
satisfying $r_4$ (i.e., $s_3$) without passing through the state 
satisfying $r_3$ (i.e., $s_2$). This problem can be translated to 
a problem of finding the maximum probability of reaching 
a set of states of the MDP, using the probability measure 
of paths under a policy defined in the previous sub-section 
(for more details, see Baier et al. [2008], Lahijanian et al. 
[2010]). There are probabilistic model-checking tools, such 
as PRISM (see [Kwiatkowska et al., 2004]), that solve this 
problem. More complex specifications can be obtained by 
nesting the probability operator and temporal operators, 
e.g., the formula $\mathcal{P}_{max=?}[\neg r_3 \, \mathcal{U} \, (r_4 \wedge \mathcal{P}_{>0.5}[\neg r_3 \, \mathcal{U} \, r_1])]$ asks 
for the maximum probability of eventually visiting state $s_3$ 
and then, with probability greater than 0.5, state $s_0$, while 
always avoiding state $s_2$. 
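
A query of the form "maximum probability of reaching a target set while avoiding another set" can be approximated by value iteration. The sketch below is one standard way to do this and is not claimed to be the algorithm used by PRISM or by the authors; the function name and the small MDP are hypothetical.

```python
def max_until_probability(states, enabled, P, avoid, target,
                          iters=1000, tol=1e-12):
    """Approximate Pmax=? [ (not avoid) U target ] by value iteration.

    Target states have value 1, avoided states have value 0, and every
    other state takes the best expected value over its enabled actions.
    """
    x = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        delta = 0.0
        for s in states:
            if s in target or s in avoid:
                continue
            best = max(
                (sum(P.get((s, a, t), 0.0) * x[t] for t in states)
                 for a in enabled[s]),
                default=0.0,
            )
            delta = max(delta, abs(best - x[s]))
            x[s] = best
        if delta < tol:
            break
    return x


# Hypothetical four-state MDP (assumed numbers, not those of Fig. 1).
states = ["s0", "s1", "s2", "s3"]
enabled = {"s0": ["a1", "a2"], "s1": ["a1"], "s2": ["a1"], "s3": ["a1"]}
P = {
    ("s0", "a1", "s1"): 0.6, ("s0", "a1", "s2"): 0.4,
    ("s0", "a2", "s3"): 0.3, ("s0", "a2", "s2"): 0.7,
    ("s1", "a1", "s3"): 0.9, ("s1", "a1", "s2"): 0.1,
    ("s2", "a1", "s2"): 1.0,
    ("s3", "a1", "s3"): 1.0,
}
values = max_until_probability(states, enabled, P,
                               avoid={"s2"}, target={"s3"})
```

In this toy example the best choice at `s0` is `a1`, which reaches `s3` through `s1` with probability 0.6 × 0.9; the maximizing action at each state also yields a policy.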

3. PROBLEM FORMULATION AND APPROACH 

We consider a city environment that is partitioned into 
a set of polytopic regions $R$. We assume the partition$^1$ is 
such that adjacent regions in the environment share exactly 
one facet. We denote $F$ as the set of facets of all 
polytopes in $R$. We assume that one region $r_p \in R$ is 
labeled as the "pick-up" region, and another region $r_d \in R$ 
is labeled as the "drop-off" region. Fig. 2(a) shows an 
example of a partitioned city environment. We assume that 
there is a vehicle moving in the environment. We require 
this vehicle to carry out the following mission objective: 

Mission Objective: Starting from an initial facet $f_{init} \in 
F$ in a region $r_{init} \in R$, the vehicle is required to reach 
the pick-up region $r_p$ to pick up a load. Then, the vehicle 
is required to reach the drop-off region $r_d$ to drop off the 
load. 

We consider a threat-rich environment with dynamic ad- 
versaries and static obstacles in some regions. The proba- 
bility of safely crossing a region depends on the number of 
adversaries and the obstacles in that region. We say that 
the vehicle is lost in a region if it fails to safely cross the 
region (and thus fails the mission objective). We assume 
that there is no adversary or obstacle in the initial region. 

Let integers $M_r$ and $N_r$ be the minimum and maximum 
number of adversaries in region $r \in R$, respectively. We 
define 

$$p_r^{init} : \{M_r, \ldots, N_r\} \to [0, 1] \tag{2}$$

as a given (initial) probability mass function for adversaries 
in region $r \in R$, i.e. $p_r^{init}(n)$ is the probability of 
having $n$ adversaries in region $r$ and $\sum_{n=M_r}^{N_r} p_r^{init}(n) = 1$. 
However, adversaries may move in between regions. We 
model the movements of adversaries in a region by arrivals 
of customers in a queue. Thus we consider the movements 
of adversaries as Poisson processes and we assume that the 
times it takes for an adversary to leave and to enter region 
$r$ are exponentially distributed with rates $\mu_l(r)$ and $\mu_e(r)$, 
respectively. We further assume that adversaries move 
independently of each other, and at region $r$, the distributions 
of adversaries in adjacent regions of $r$ depend only on the 
adversaries in $r$ and the movements of adversaries between 
$r$ and its adjacent regions. 

1 Throughout the paper, we relax the notion of a partition by 
allowing regions to share facets. 

In addition, each region has an attribute that characterizes 
the presence of obstacles, which we call obstacle density. 
We define 

$$p_r^o : \{0, 1, \ldots, N_r^o\} \to [0, 1], \tag{3}$$

as the probability mass function of the obstacle density 
in region $r \in R$, i.e., $p_r^o(o)$ is the probability of having 
obstacle density $o$ in region $r$ and $\sum_{o=0}^{N_r^o} p_r^o(o) = 1$. Unlike 
adversaries, we assume that obstacles cannot move in 
between regions. 

We assume that the vehicle has a map of the environment 
and can detect its current region. When the vehicle enters 
a region, it observes the number of adversaries and the 
obstacle density in this region. When the vehicle is travers- 
ing inside a region, it detects movements of adversaries 
between the current region and its adjacent regions. 

The motion capability of the vehicle in the environment is 
limited by a (not necessarily symmetric) relation $\Delta \subseteq F \times 
F$, with the following meaning: If the vehicle is at a facet 
$f \in F$ and $(f, f') \in \Delta$, then it can use a motion primitive 
to move from $f$ towards $f'$ (without passing through any 
other facet), i.e., $\Delta$ represents a set of motion primitives 
for the vehicle. The control of the vehicle is represented 
by $(f, f') \in \Delta$, with the meaning that at facet $f$, $f'$ is 
the next facet the vehicle should move towards. Fig. 2(b) 
shows possible motions of the vehicle in this environment. 
We assume that the time it takes for the vehicle to move 
from facet $f$ to facet $f'$ is exponentially distributed with 
rate $\lambda(\delta)$, where $\delta = (f, f') \in \Delta$. This assumption is based 
on results from a simulator of the environment (see Sec. 
6). 

During the time when the vehicle is executing a motion 
primitive $(f, f')$ (i.e., moving between facets $f$ and $f'$), we 
denote the probability of losing the vehicle as: 

$$p_\delta^{lost} : \{M_r, \ldots, N_r\} \times \{0, \ldots, N_r^o\} \to [0, 1], \tag{4}$$

where $\delta = (f, f') \in \Delta$, and $r$ is the region bounded by $f$ 
and $f'$. We obtain $p_\delta^{lost}(n, o)$ and $\lambda(\delta)$ from the simulator of 
the environment given initial distributions of adversaries 
and obstacle density in each region (see Sec. 6 for more 
details). 
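
How the simulator output is turned into $\lambda(\delta)$ and $p_\delta^{lost}(n, o)$ is described in Sec. 6, not here. As a rough sketch only, assuming the simulator returns a list of traversal times and a list of lost/not-lost outcomes for a fixed $\delta$, $n$ and $o$, the natural estimators would be the maximum-likelihood rate of an exponential distribution and an empirical frequency. Both function names and the sample data are hypothetical.

```python
def estimate_rate(traversal_times):
    """Maximum-likelihood rate of an exponential distribution.

    For samples t_1, ..., t_k the estimate is k / (t_1 + ... + t_k),
    i.e. the reciprocal of the sample mean.
    """
    if not traversal_times:
        raise ValueError("need at least one traversal time")
    return len(traversal_times) / sum(traversal_times)


def estimate_loss_probability(outcomes):
    """Fraction of simulated traversals in which the vehicle was lost."""
    if not outcomes:
        raise ValueError("need at least one simulated traversal")
    return sum(1 for lost in outcomes if lost) / len(outcomes)


# Hypothetical simulator output for one motion primitive.
rate = estimate_rate([1.0, 2.0, 3.0])
p_lost = estimate_loss_probability([True, False, False, False])
```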

In this paper we aim to find a reactive control strategy for 
the vehicle. A vehicle control strategy at a region $r$ depends 
on the facet $f$ through which the vehicle entered $r$. It 
returns the facet $f'$ the vehicle should move towards, such 
that $(f, f') \in \Delta$. The control strategy is reactive in the 
sense that it also depends on the number of adversaries and 
the obstacle density observed when entering the current 
region, as well as the movements of adversaries in the 
current region. We are now ready to formulate the main 
problem we consider in this paper: 

Problem: Consider the partitioned environment defined 
by $R$ and $F$; initial facet and region $f_{init}$ and $r_{init}$; 
the motion capability $\Delta$ of a vehicle; initial adversary 
and obstacle density distributions for each region $p_r^{init}$ 
and $p_r^o$; the probability of losing the vehicle $p_\delta^{lost}$; rates 
of adversaries $\mu_l(r)$ and $\mu_e(r)$; and rate of the vehicle 
$\lambda(\delta)$. Find the vehicle control strategy that maximizes the 
probability of satisfying the Mission Objective. 

Fig. 2. Example of a partitioned city environment. 
(a) A realistic scenario representing a city environment 
partitioned into regions. $r_p$ denotes the "pick-up" region, 
$r_d$ denotes the "drop-off" region. 
(b) Possible motion of the vehicle in the environment. 
The arrows represent movements of the vehicle in 
between facets, e.g., the vehicle can choose to go 
from $f_2$ towards another facet. For this scenario, we assume 
that, only at the pick-up and drop-off regions, the 
vehicle can enter and leave through the same facet. 

The key idea of our approach is to model the motion of 
the vehicle in the environment, as well as vehicle estimates 
of adversary distributions in the environment as an MDP. 
By capturing estimates of adversary distributions in this 
MDP, the vehicle updates the adversary distributions of its 
adjacent regions as it detects the movements of adversaries 
in the current region, and the control strategy produces an 
updated control if necessary. As a result, a policy for the 
MDP is equivalent to a reactive control strategy for the 
vehicle in the environment. We then translate the mission 
objective to a PCTL formula and find the optimal policy 
satisfying this formula with the maximum probability. 

Remark 1. In this paper, we assume a "deterministic" 
vehicle control model. In other words, we assume that 
the vehicle can use a motion primitive $(f, f')$ to move 
from facet $f$ to facet $f'$ of each region. We can easily 
extend the result of this paper to the case when the 
vehicle has a "probabilistic" control model, in which the 
application of a motion primitive at a facet of a region 
enables transitions with known probabilities to several 
facets of the same region. This can easily be achieved by 
modifying the transition probability function of the MDP. 



4. CONSTRUCTION OF AN MDP MODEL 

In this section we explain the construction of an MDP 
model for the motion of the vehicle and vehicle estimates 
of adversary distributions in the environment. We first 
explain in Sec. 4.1 how the vehicle updates its estimate 
of adversary distributions for its adjacent regions when an 
adversary enters or leaves the current region. The updates 
of adversary distributions are captured in the MDP model. 
Then we define the MDP in Sec. 4.2. In Sec. 4.3 we describe 
in detail how we obtain the transition probability function 
for the MDP model. 

4.1 Update of the adversary distributions 

As adversaries enter and leave the current region, it is 
necessary to update the distributions of adversaries in 
adjacent regions. Because the vehicle can only observe 
the movements of adversaries in its current region, and 
due to the assumption that distributions of adversaries in 
adjacent regions depend only on the current region and 
its adjacent regions, it is only necessary to update the 
adversary distributions for adjacent regions, and not for 
all regions in the environment. Our MDP model captures 
all possible adversary distributions of adjacent regions at 
each region. 

Let us denote the distribution for region $r$ as $p_r$. The initial 
adversary distribution of region $r$ is given in Eq. (2). Thus, 
the adversary distribution of region $r$ is a probability mass 
function $p_r : \{M, \ldots, N\} \to [0, 1]$, where $M_r \le M \le N \le 
N_r$. Note that if $M = N = N_r$, then $p_r(N_r) = 1$ and 
no adversary may enter region $r$, or else the assumption 
that $N_r$ is the maximum number of adversaries in region 
$r$ would be violated. Similarly, if $M = N = M_r$, then no 
adversary may leave region $r$. 

Given the current adversary distribution $p_r$, assuming that 
an adversary has entered region $r$ (which means that 
$p_r(N_r) \neq 1$), we define the updated distribution 
$p_r^+$ in the following way: 

$$p_r^+ : \begin{cases} \{M+1, \ldots, N\} \to [0, 1] & \text{if } N = N_r \\ \{M+1, \ldots, N+1\} \to [0, 1] & \text{if } N < N_r, \end{cases} \tag{5}$$

such that: 

$$p_r^+(n) = \begin{cases} p_r(n-1) + \dfrac{p_r(N_r)}{N - M} & \text{if } N = N_r \\ p_r(n-1) & \text{if } N < N_r. \end{cases} \tag{6}$$



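[Figure 3: tree of adversary distributions for region $r$. Each box lists a support and the corresponding probabilities.] 

• $p_r$ (root): {2, 3, 4, 5, 6} with probabilities {.2, .1, .3, 0, .4} 
• $p_r^+$: {3, 4, 5, 6} with probabilities {.3, .2, .4, .1} 
• $p_r^-$: {2, 3, 4, 5} with probabilities {.15, .35, .05, .45} 
• child of $p_r^+$ via +1: {4, 5, 6} with probabilities {.33, .23, .43} 
• child of $p_r^+$ via -1: {2, 3, 4, 5} with probabilities {.3, .2, .4, .1} 

Fig. 3. An example of using a tree to obtain all possible 
distributions for adversaries in a region. Starting with 
$p_r^{init}$, we set $p_r = p_r^{init}$ and obtain all distributions 
in $D_r$ using Eq. (5)-(8). In this example, we have 
$p_r : \{2, 3, 4, 5, 6\} \to [0, 1]$, where $p_r(2) = .2$, $p_r(3) = 
.1$, $p_r(4) = .3$, $p_r(5) = 0$ and $p_r(6) = .4$. Each 
box denotes the updated distribution from a previous 
distribution. An arrow with +1 (or -1) means that 
we assume an adversary entered (or left) region $r$. 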

Note that the probability distribution simply shifts by 1 
if $N < N_r$. If $N = N_r$, given that an adversary entered 
region $r$, we can conclude that the previous number of 
adversaries cannot be $N_r$, thus we evenly redistribute the 
probability associated with $N_r$ before an adversary entered 
the region. 

Similarly, assuming that an adversary has left region $r$, 
then $p_r(M_r) \neq 1$ and we define the updated distribution 
$p_r^-$ in the following way: 

$$p_r^- : \begin{cases} \{M, \ldots, N-1\} \to [0, 1] & \text{if } M = M_r \\ \{M-1, \ldots, N-1\} \to [0, 1] & \text{if } M > M_r, \end{cases} \tag{7}$$

such that: 

$$p_r^-(n) = \begin{cases} p_r(n+1) + \dfrac{p_r(M_r)}{N - M} & \text{if } M = M_r \\ p_r(n+1) & \text{if } M > M_r. \end{cases} \tag{8}$$

Given $p_r$, it is easy to verify that $p_r^+$ 
is a valid probability mass function, i.e. $\sum_n p_r^+(n) = 1$ over its domain 
(similarly for $p_r^-$). Starting with the initial distribution 
$p_r^{init}$, we can use Eq. (5)-(8) to determine all possible 
adversary distributions for region $r$. We denote the set 
of all possible distributions for region $r$ as $D_r$. We can use 
a tree to obtain $D_r$; an example is shown in Fig. 3. 
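
The updates of Eq. (5)-(8) and the tree exploration that yields $D_r$ can be sketched as follows. This is an illustrative reading of the equations, assuming a distribution is stored as a dictionary from adversary counts to probabilities with contiguous support; the function names are hypothetical. The example data are the root distribution of Fig. 3 with $M_r = 2$ and $N_r = 6$.

```python
from typing import Dict, List

Pmf = Dict[int, float]  # number of adversaries -> probability


def can_enter(p: Pmf, n_max: int, eps: float = 1e-12) -> bool:
    """An adversary may enter region r unless p_r(N_r) = 1."""
    return p.get(n_max, 0.0) < 1.0 - eps


def can_leave(p: Pmf, n_min: int, eps: float = 1e-12) -> bool:
    """An adversary may leave region r unless p_r(M_r) = 1."""
    return p.get(n_min, 0.0) < 1.0 - eps


def enter_update(p: Pmf, n_max: int) -> Pmf:
    """p_r^+ of Eq. (5)-(6): an adversary has entered region r."""
    M, N = min(p), max(p)
    if N < n_max:  # plain shift by +1
        return {n + 1: q for n, q in p.items()}
    share = p[N] / (N - M)  # spread p_r(N_r) evenly over the new support
    return {n: p[n - 1] + share for n in range(M + 1, N + 1)}


def leave_update(p: Pmf, n_min: int) -> Pmf:
    """p_r^- of Eq. (7)-(8): an adversary has left region r."""
    M, N = min(p), max(p)
    if M > n_min:  # plain shift by -1
        return {n - 1: q for n, q in p.items()}
    share = p[M] / (N - M)  # spread p_r(M_r) evenly over the new support
    return {n: p[n + 1] + share for n in range(M, N)}


def all_distributions(p_init: Pmf, n_min: int, n_max: int) -> List[Pmf]:
    """Explore the tree of Fig. 3 and collect D_r, merging duplicates."""
    def key(p: Pmf):
        return tuple((n, round(q, 9)) for n, q in sorted(p.items()))

    seen = {key(p_init): p_init}
    frontier = [p_init]
    while frontier:
        p = frontier.pop()
        children = []
        if can_enter(p, n_max):
            children.append(enter_update(p, n_max))
        if can_leave(p, n_min):
            children.append(leave_update(p, n_min))
        for child in children:
            if key(child) not in seen:
                seen[key(child)] = child
                frontier.append(child)
    return list(seen.values())


# Root distribution of Fig. 3, with M_r = 2 and N_r = 6.
p_r = {2: 0.2, 3: 0.1, 4: 0.3, 5: 0.0, 6: 0.4}
p_plus = enter_update(p_r, n_max=6)
p_minus = leave_update(p_r, n_min=2)
D_r = all_distributions(p_r, n_min=2, n_max=6)
```

The exploration terminates because a shift leaves the probabilities unchanged and every non-shift update shrinks the support by one element. Rounding in `key` is only there to merge distributions that differ by floating-point noise.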

4.2 MDP construction 

To begin the construction of the MDP model, we denote 
$B \subseteq F \times R$ as the boundary relation where $(f, r) \in B$ if 
and only if $f$ is a facet of region $r$. We denote the set of 
regions adjacent to region $r$ as $A_r = \{r_1, \ldots, r_m\} \subseteq R$. 

Given $R$, $F$, $\Delta$, $D_r$, $p_r^o$, $p_\delta^{lost}$, $\mu_l(r)$, $\mu_e(r)$ and $\lambda(\delta)$, we 
define a labeled MDP $\mathcal{M}$ as a tuple $(S, s_0, Act, A, P, \Pi, h)$ 
(see Def. 2.1), where: 

• $S = \bigcup_{r \in R} \{ \{(f, \rho) \in B \mid \rho = r\} \times \{M_r, \ldots, N_r\} \times 
\{0, 1, \ldots, N_r^o\} \times \{lost, alive\} \times \prod_{r' \in A_r} D_{r'} \}$. The meaning 
of the state is as follows: $((f, r), n, o, alive, p_{r_1}, \ldots, 
p_{r_m})$ means that the vehicle is at facet $f$, heading 
towards region $r$, and in region $r$ there are 
$n$ adversaries, obstacle density $o$, the vehicle is currently 
not lost, and the adversary distribution for the 
adjacent region $r_i \in A_r = \{r_1, \ldots, r_m\}$ is $p_{r_i}$. 
$((f, r), n, o, lost, p_{r_1}, \ldots, p_{r_m})$ means that the vehicle 
did not make it to facet $f$ because it was lost in the 
previous region while heading towards $f$; 

• $s_0 = ((f_{init}, r_{init}), 0, 0, alive, p^{init}_{r'_1}, \ldots, p^{init}_{r'_k})$ is the 
initial state, where $A_{r_{init}} = \{r'_1, \ldots, r'_k\}$; 

• $Act = \Delta \cup \tau$ is the set of actions, where $\tau$ is a dummy 
action when the vehicle is lost; 

• $A$ is defined as follows: If the vehicle is alive, then 
$A(s) = \{(f, f') \in \Delta\}$, otherwise $A(s) = \tau$; 

• We describe how we generate the transition probability 
function $P$ in Sec. 4.3; 

• $\Pi = \{r_p, r_d, alive\}$ is the set of properties; 

• $h$ is defined as follows: If $s = ((f, r), n, o, b, p_1, \ldots, p_m)$, 
then $alive \in h(s)$ if and only if $b = alive$, $r_p \in 
h(s)$ if and only if $r = r_p$, and $r_d \in h(s)$ if and only 
if $r = r_d$. 
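
The state tuple and the labeling function $h$ above map naturally onto a small record type. The following is only an illustrative sketch: the class name, field names and the string labels `"alive"`, `"r_p"`, `"r_d"` and `"tau"` are assumptions made for the example, and `enabled_actions` reads the definition of $A$ literally as "all pairs in $\Delta$ that start at the current facet".

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

# A pmf stored as a tuple of (count, probability) pairs so it is hashable.
FrozenPmf = Tuple[Tuple[int, float], ...]


class VehicleState(NamedTuple):
    facet: str                 # f
    region: str                # r, the region the vehicle is heading into
    adversaries: int           # n
    obstacle_density: int      # o
    alive: bool                # True for "alive", False for "lost"
    neighbor_pmfs: Tuple[FrozenPmf, ...]  # p_{r_1}, ..., p_{r_m}


def label(state: VehicleState, pickup: str, dropoff: str) -> FrozenSet[str]:
    """Labeling function h: the properties of Pi that hold at a state."""
    props = set()
    if state.alive:
        props.add("alive")
    if state.region == pickup:
        props.add("r_p")
    if state.region == dropoff:
        props.add("r_d")
    return frozenset(props)


def enabled_actions(state: VehicleState,
                    motion_relation: Set[Tuple[str, str]]):
    """Enabled actions A(s): motion primitives from f, or the dummy action."""
    if not state.alive:
        return {"tau"}
    return {(f, g) for (f, g) in motion_relation if f == state.facet}


# Hypothetical state and motion relation.
s = VehicleState("f2", "r4", 2, 3, True, ())
lost = s._replace(alive=False)
delta = {("f2", "f8"), ("f2", "f5"), ("f8", "f2")}
```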

As the vehicle moves in the environment, it updates its 
corresponding state on $\mathcal{M}$. The vehicle updates its state 
when: 

• it reaches a facet $f$ and enters a region $r$, and 
observes the number of adversaries $n$ and obstacle 
density $o$ in region $r$; it then updates its state to 
$((f, r), n, o, alive, p^{init}_{r_1}, \ldots, p^{init}_{r_m})$; 

• an adversary leaves the current region $r$ and moves 
into region $r'$; given the current adversary distribution 
of region $r'$ as $p_{r'}$, the vehicle updates this 
distribution to $p^+_{r'}$; 

• an adversary enters the current region $r$ from region 
$r'$; given the current adversary distribution of region 
$r'$ as $p_{r'}$, the vehicle updates this distribution to $p^-_{r'}$. 

Since the actions of $\mathcal{M}$ consist of $\Delta$, $\mathcal{M}$ is designed so that 
its control policy can be directly translated to a reactive 
control strategy for the vehicle. When the vehicle updates 
its state in $\mathcal{M}$, the action $\delta \in \Delta$ at its current state 
determines the next facet the vehicle should move towards. 



4.3 Generating the transition probability function P 

In this subsection we describe in detail how we generate 
the transition probability function $P$ for the MDP model. 
First, we define a random variable $e$ for the time in between 
a vehicle entering the current region $r$ at facet $f$, heading 
towards facet $f'$, and an event occurring, which can be: 1) 
an adversary entering the current region; 2) an adversary 
leaving the current region; or 3) the vehicle reaching facet $f'$. 

Note that if $X_1, \ldots, X_n$ are independent exponentially 
distributed random variables with rate parameters $\lambda_1, \ldots, \lambda_n$, 
then $\min\{X_1, \ldots, X_n\}$ is exponentially distributed with 
parameter $\lambda = \sum_{i=1}^{n} \lambda_i$. The probability that $X_k$ is the 
minimum is $Pr(X_k = \min\{X_1, \ldots, X_n\}) = \lambda_k / \lambda$. By 
assumption, movements of adversaries are independent of 
each other. Since the arrival and departure of adversaries 
in the current region are modeled as two Poisson processes 
with inter-arrival and inter-departure times exponentially 
distributed with rates $\mu_e(r)$ and $\mu_l(r)$, respectively, and 
the time required for the vehicle to reach facet $f'$ is 
exponentially distributed with rate $\lambda(\delta)$, where $\delta = (f, f')$, 
the random variable $e$ is also exponentially distributed. We 
assume $e$ is exponentially distributed with rate $\nu$. 

[Figure 4: MDP fragment with source state $s = ((f_2, r_4), 2, 3, alive, p_{r_1}, p_{r_3}, p_{r_7})$ and action $\delta = (f_2, f_8)$.] 

Fig. 4. A fragment of the MDP $\mathcal{M}$ corresponding to the mission scenario shown in Fig. 2(b). As an example, assume the 
following: $\lambda((f_2, f_8)) = .5$, $\mu_e(r_4) = \mu_l(r_4) = .3$, $p_{r_1}(x) = p^{init}_{r_1}(x) = 1/3$, $x \in \{1, 2, 3\}$, $p_{r_3}(x) = p^{init}_{r_3}(x) = 1/3$, $x \in 
\{2, 3, 4\}$, $p_{r_7}(2) = p^{init}_{r_7}(2) = .6$, $p_{r_7}(3) = p^{init}_{r_7}(3) = .2$, $p_{r_7}(4) = p^{init}_{r_7}(4) = .2$. The labels over the transitions 
correspond to the transition probabilities obtained using Eq. (10)-(12) (e.g., the probability that an adversary 
leaves a region is .68 and the probability that it enters region $r_1$ is .26, thus $P(s, \delta, s') = .68 \times .26$). 
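
The fact about the minimum of independent exponential random variables can be checked numerically. The sketch below computes $\lambda_k / \lambda$ for each competing event and compares it with a simple Monte Carlo estimate; the event names and rates are hypothetical and are not taken from the paper.

```python
import random


def first_event_probabilities(rates):
    """Probability that each exponential clock fires first: rate / total."""
    total = sum(rates.values())
    return {name: r / total for name, r in rates.items()}


def simulate_first_event(rates, trials, seed=0):
    """Monte Carlo estimate of the same probabilities."""
    rng = random.Random(seed)
    counts = dict.fromkeys(rates, 0)
    for _ in range(trials):
        # Draw one exponential time per event and record which is smallest.
        samples = {name: rng.expovariate(r) for name, r in rates.items()}
        counts[min(samples, key=samples.get)] += 1
    return {name: c / trials for name, c in counts.items()}


# Hypothetical rates for the three kinds of event listed above.
rates = {
    "vehicle_reaches_facet": 0.5,
    "adversary_leaves": 1.0,
    "adversary_enters": 1.5,
}
exact = first_event_probabilities(rates)
approx = simulate_first_event(rates, trials=20000)
```

With these rates the exact values are 1/6, 1/3 and 1/2, and the simulated frequencies should agree to within sampling error.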

At region $r$, assuming that the adversary distribution of 
an adjacent region $r' \in A_r$ is $p_{r'}$, we define $B_r \subseteq A_r$ as 
the set of adjacent regions $r'$ of $r$ such that $p_{r'}(M_{r'}) \neq 1$ 
(i.e. the set of adjacent regions from which an adversary 
can leave) and $C_r \subseteq A_r$ as the set of adjacent regions $r'$ of 
$r$ such that $p_{r'}(N_{r'}) \neq 1$ (i.e. the set of adjacent regions 
which an adversary can enter). We denote $E_r$ as the 
expected value for the distribution $p_r$. 

Since the vehicle cannot detect the exact number of 
adversaries in adjacent regions, only an estimated value 
$\nu_e$ of $\nu$ can be obtained from the expected number of 
adversaries in adjacent regions. Assume the current state 
is $((f, r), n, o, alive, p_{r_1}, \ldots, p_{r_m})$. If an adversary can leave 
the current region $r$ (i.e. $n > M_r$ and $C_r \neq \emptyset$) then the time it 
takes for an adversary to leave region $r$ is exponentially 
distributed with rate $\mu_l(r) n$ because there are $n$ adversaries 
in the region and any of them can leave region $r$. Similarly, 
if an adversary can enter the current region $r$ (i.e. $n < N_r$), 
and there exists an adversary that can leave an adjacent 
region (i.e. $B_r \neq \emptyset$), then the time it takes for an adversary 
to enter region $r$ is exponentially distributed with the 
estimated rate $\mu_e(r) \sum_{r' \in B_r} E_{r'}$, where $\sum_{r' \in B_r} E_{r'}$ gives 
the total expected number of adversaries that can enter 
region $r$. The time it takes for the vehicle to reach facet $f'$ 
is exponentially distributed with rate $\lambda(\delta)$. Therefore, the 
estimated rate $\nu_e$ can be obtained as: 

$$\nu_e = \lambda(\delta) + \mu_l(r) \, n \, I_l(A_r, n) + \mu_e(r) \sum_{r' \in B_r} E_{r'} \, I_e(n) \tag{9}$$

where $n$ is the number of adversaries in the current region; 
$I_l(A_r, n) = 0$ when $n = M_r$ or $C_r = \emptyset$, and $I_l(A_r, n) = 1$ 
otherwise; and $I_e(n) = 0$ if $n = N_r$, and $I_e(n) = 1$ 
otherwise. Indicator functions $I_l(A_r, n)$ and $I_e(n)$ are used 
to determine if it is possible for an adversary to leave and 
enter the current region, respectively. 
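
Eq. (9) can be evaluated directly once the neighbor distributions and their bounds are known. The sketch below uses the rates and neighbor distributions quoted in the caption of Fig. 4. Two things are assumptions made for the example: the bounds of the current region $r_4$ (taken as 0 and 5, so that adversaries can both enter and leave), and the bounds of each neighbor, taken to be the ends of the support of its initial distribution as in Eq. (2). The function names are hypothetical.

```python
def expected_value(pmf):
    """E_r: expected number of adversaries under a distribution."""
    return sum(n * q for n, q in pmf.items())


def estimated_rate(lam, mu_l, mu_e, n, n_min, n_max, neighbors):
    """Estimated event rate nu_e of Eq. (9).

    neighbors maps a region name to (pmf, M_r', N_r').
    """
    # B_r: neighbors an adversary can leave; C_r: neighbors it can enter.
    B = [r for r, (pmf, lo, hi) in neighbors.items() if pmf.get(lo, 0.0) < 1.0]
    C = [r for r, (pmf, lo, hi) in neighbors.items() if pmf.get(hi, 0.0) < 1.0]
    I_l = 0 if (n == n_min or not C) else 1
    I_e = 0 if n == n_max else 1
    expected_entering = sum(expected_value(neighbors[r][0]) for r in B)
    return lam + mu_l * n * I_l + mu_e * expected_entering * I_e


# Neighbor distributions from the caption of Fig. 4.
neighbors = {
    "r1": ({1: 1 / 3, 2: 1 / 3, 3: 1 / 3}, 1, 3),
    "r3": ({2: 1 / 3, 3: 1 / 3, 4: 1 / 3}, 2, 4),
    "r7": ({2: 0.6, 3: 0.2, 4: 0.2}, 2, 4),
}
# lambda = .5 and mu_e = mu_l = .3 as in Fig. 4; n = 2 adversaries in r4.
# The bounds 0 and 5 for r4 are ASSUMED, they are not given in the text.
nu_e = estimated_rate(lam=0.5, mu_l=0.3, mu_e=0.3, n=2,
                      n_min=0, n_max=5, neighbors=neighbors)
```

Under these assumptions the expected numbers of adversaries in the three neighbors are 2, 3 and 2.6, so $\nu_e = 0.5 + 0.3 \cdot 2 + 0.3 \cdot 7.6 = 3.38$.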



The rate $\nu_e$ will be used to generate the probability 
transition function $P$. We define the probability transition 
function $P : S \times Act \times S \to [0, 1]$ as follows: 

• If $s = ((f, r), n, o, alive, p_{r_1}, \ldots, p_{r_m})$, $s' = ((f', r'), n', o', b', p_{r'_1}, \ldots, p_{r'_k})$, 
with $\{r_1, \ldots, r_m\} = A_r$ and 
$\{r'_1, \ldots, r'_k\} = A_{r'}$, $\delta = (f, f') \in \Delta$ and $r' \in A_r$, 
then: 

$$P(s, \delta, s') = \begin{cases} \dfrac{\lambda(\delta)}{\nu_e} \, p_{r'}(n') \, p^o_{r'}(o') \, (1 - p^{lost}_\delta(n, o)), & \text{if } b' = alive \\[2ex] \dfrac{\lambda(\delta)}{\nu_e} \, p_{r'}(n') \, p^o_{r'}(o') \, p^{lost}_\delta(n, o), & \text{if } b' = lost. \end{cases} \tag{10}$$

Under the action $(f, f')$, the transition from state 
$s$ to $s'$ indicates that either the vehicle reaches facet 
$f'$ ($s'$ is an "alive" state) or the vehicle is lost while 
traversing the region $r$ ($s'$ is a "lost" state). 

Let us first consider the former case. $\frac{\lambda(\delta)}{\nu_e}$ corresponds 
to the probability that the vehicle reaches 
facet $f'$ before any adversary enters or leaves 
region $r$. $p_{r'}(n')$ corresponds to the probability of 
observing $n'$ adversaries in region $r'$ when entering 
region $r'$ from facet $f'$. $p^o_{r'}(o')$ corresponds to the 
probability of observing obstacle density $o'$ for region 
$r'$ when entering $r'$. $(1 - p^{lost}_\delta(n, o))$ corresponds to 
the probability of safely crossing the current region 
with $n$ adversaries and obstacle density $o$. Since 
these events are independent of each other, the 
transition probability is the product of the 
above probabilities. The same reasoning applies to 
the latter case, where $(1 - p^{lost}_\delta(n, o))$ is replaced by 
$p^{lost}_\delta(n, o)$ as the probability of losing the vehicle while 
crossing region $r$. 

• If $s = ((f, r), n, o, alive, p_{r_1}, \ldots, p_{r_m})$, $s' = ((f, r), n + 
1, o, alive, p'_{r_1}, \ldots, p'_{r_m})$, with $\{r_1, \ldots, r_m\} = A_r$, $\delta = 
(f, f') \in \Delta$ for some $f'$, $p'_{r_i} = p_{r_i}$ for all $i \in 
\{1, \ldots, m\} \setminus \{j\}$ and $p'_{r_j} = p^-_{r_j}$ for some $j$, then: 

$$P(s, \delta, s') = \frac{\mu_e(r) E_{r_j}}{\nu_e}. \tag{11}$$

The transition from state $s$ to $s'$ indicates that an 
adversary from region $r_j$ enters the current region. 
Thus, the adversary distribution of region $r_j$ is updated 
to $p'_{r_j} = p^-_{r_j}$ (while the distributions for the 
other regions remain the same). $\frac{\mu_e(r) E_{r_j}}{\nu_e}$ corresponds 
to the probability that an adversary enters region 
$r$ from $r_j$ before the vehicle reaches facet $f'$ or an 
adversary moves in between the current region and 
another adjacent region. 

• If $s = ((f, r), n, o, alive, p_{r_1}, \ldots, p_{r_m})$, $s' = ((f, r), n - 
1, o, alive, p'_{r_1}, \ldots, p'_{r_m})$, with $\{r_1, \ldots, r_m\} = A_r$, $\delta = 
(f, f') \in \Delta$ for some $f'$, $p'_{r_i} = p_{r_i}$ for all $i \in 
\{1, \ldots, m\} \setminus \{j\}$ and $p'_{r_j} = p^+_{r_j}$ for some $j$, then: 

$$P(s, \delta, s') = \frac{\mu_l(r) \, n}{\nu_e |C_r|}, \tag{12}$$

where $|C_r|$ is the cardinality of $C_r$. The transition 
from the state $s$ to $s'$ indicates that an adversary 
leaves the current region and enters region $r_j$. Thus, 
the adversary distribution of region $r_j$ is updated to 
$p'_{r_j} = p^+_{r_j}$. $\frac{\mu_l(r) n}{\nu_e |C_r|}$ corresponds to the probability that 
an adversary enters $r_j$ from region $r$ before the vehicle 
reaches facet $f'$ or an adversary enters the current 
region. 

• If $s = ((f, r), n, o, lost, p_{r_1}, \ldots, p_{r_m})$, then $P(s, \tau, s) = 
1$. $s$ corresponds to the case where the vehicle is lost, 
thus it self-loops with probability 1. 

• Otherwise, $P(s, \delta, s') = 0$. 

To help understand the computation of P, a fragment of 
the MDP model corresponding to the mission scenario in 
Fig. 2(b) is shown in Fig. 4. The following proposition 
ensures that P is a valid probability transition function: 

Proposition 2. $P$ is a valid probability transition function, i.e., $\sum_{s' \in S} P(s, \delta, s') = 1$ if $\delta \in A(s)$ and $P(s, \delta, s') = 0$ if $\delta \notin A(s)$.

Proof. From the definitions of $P$ and $A$, it follows that $P(s, \delta, s') = 0$ if $\delta \notin A(s)$. We want to show that $\sum_{s' \in S} P(s, \delta, s') = 1$ for all combinations of $B_r$, $C_r$, $M_r$, $N_r$, and $n$ when $\delta \in A(s)$. Let us denote by $S_t \subset S$ the set of states that are defined as $s'$ in Eq. (10), by $S_e \subset S$ the set of states that are defined as $s'$ in Eq. (11), and by $S_l \subset S$ the set of states that are defined as $s'$ in Eq. (12).

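If $B_r \neq \emptyset$ and $n < N_r$ with $C_r = \emptyset$ or $n = M_r$, then, by Eq. (9), $\nu_e = \lambda(\delta) + \mu_e(r) \sum_{r' \in B_r} E_{r'}$. Using Eq. (10)-(12) it follows that:

\[
\begin{aligned}
\sum_{s' \in S} P(s, \delta, s') ={}& \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, p^{lost}_{\delta}(n, o) + \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, (1 - p^{lost}_{\delta}(n, o)) \\
&+ \sum_{r' \in B_r} \frac{\mu_e(r)\, E_{r'}}{\nu_e} \\
={}& \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o') + \sum_{r' \in B_r} \frac{\mu_e(r)\, E_{r'}}{\nu_e} \\
={}& \frac{\lambda(\delta) + \mu_e(r) \sum_{r' \in B_r} E_{r'}}{\nu_e} = 1.
\end{aligned}
\]

Similarly, if $C_r \neq \emptyset$ and $n > M_r$ with $B_r = \emptyset$ or $n = N_r$, then $\nu_e = \lambda(\delta) + \mu_l(r)\, n$, and using Eq. (10)-(12) it follows that:

\[
\begin{aligned}
\sum_{s' \in S} P(s, \delta, s') ={}& \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, p^{lost}_{\delta}(n, o) + \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, (1 - p^{lost}_{\delta}(n, o)) \\
&+ \sum_{r' \in C_r} \frac{\mu_l(r)\, n}{\nu_e\, |C_r|} \\
={}& \frac{\lambda(\delta) + \mu_l(r)\, n}{\nu_e} = 1.
\end{aligned}
\]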



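If $B_r = \emptyset$ and $C_r = \emptyset$, or if $n = M_r = N_r$, then $\nu_e = \lambda(\delta)$. Using Eq. (10)-(12) it follows that:

\[
\begin{aligned}
\sum_{s' \in S} P(s, \delta, s') ={}& \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, p^{lost}_{\delta}(n, o) + \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, (1 - p^{lost}_{\delta}(n, o)) \\
={}& \frac{\lambda(\delta)}{\lambda(\delta)} = 1.
\end{aligned}
\]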

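In the most general case, when $B_r \neq \emptyset$, $C_r \neq \emptyset$, and $M_r < n < N_r$, then, by Eq. (9), $\nu_e = \lambda(\delta) + \mu_l(r)\, n + \mu_e(r) \sum_{r' \in B_r} E_{r'}$. Using Eq. (10)-(12) it follows that:

\[
\begin{aligned}
\sum_{s' \in S} P(s, \delta, s') ={}& \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, p^{lost}_{\delta}(n, o) + \sum_{s' \in S_t} \frac{\lambda(\delta)}{\nu_e}\, p_{r'}(n')\, p^{o}_{r'}(o')\, (1 - p^{lost}_{\delta}(n, o)) \\
&+ \sum_{r' \in C_r} \frac{\mu_l(r)\, n}{\nu_e\, |C_r|} + \sum_{r' \in B_r} \frac{\mu_e(r)\, E_{r'}}{\nu_e} \\
={}& \frac{\lambda(\delta) + \mu_l(r)\, n + \mu_e(r) \sum_{r' \in B_r} E_{r'}}{\nu_e} = 1.
\end{aligned}
\]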



Thus the proof is completed. 
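The four cases of the proof can also be checked numerically. The sketch below is a minimal Python illustration, not the authors' Matlab code. It assumes that the exit rate of Eq. (9) is the sum of the vehicle rate and of whichever adversary rates are enabled, as used in the proof, and that the successor probabilities follow Eqs. (10)-(12). The function name, the argument names, and all numerical values are hypothetical.

```python
def outgoing_probabilities(lam, mu_l, mu_e, n, n_min, n_max, E_B, num_C, next_pmf, p_lost):
    """Outgoing transition probabilities of one 'alive' state under one action.

    E_B      : one value E_{r'} per region in B_r (empty list if B_r is empty)
    num_C    : |C_r|
    next_pmf : joint probabilities p_{r'}(n') * p^o_{r'}(o') of the successors; sums to 1
    p_lost   : p^lost_delta(n, o)
    """
    can_enter = len(E_B) > 0 and n < n_max   # an adversary may enter the region
    can_leave = num_C > 0 and n > n_min      # an adversary may leave the region
    nu_e = lam                               # exit rate, assumed form of Eq. (9)
    if can_leave:
        nu_e += mu_l * n
    if can_enter:
        nu_e += mu_e * sum(E_B)
    probs = []
    for p_next in next_pmf:
        probs.append(lam / nu_e * p_next * p_lost)          # Eq. (10), vehicle lost
        probs.append(lam / nu_e * p_next * (1.0 - p_lost))  # Eq. (10), vehicle alive
    if can_enter:
        probs.extend(mu_e * e / nu_e for e in E_B)           # Eq. (11)
    if can_leave:
        probs.extend([mu_l * n / (nu_e * num_C)] * num_C)    # Eq. (12)
    return probs

next_pmf = [0.2, 0.3, 0.5]
cases = {
    "enter only": dict(n=1, n_min=1, n_max=3, E_B=[2.0, 8.0], num_C=2),
    "leave only": dict(n=3, n_min=1, n_max=3, E_B=[2.0, 8.0], num_C=2),
    "vehicle only": dict(n=2, n_min=2, n_max=2, E_B=[], num_C=0),
    "general": dict(n=2, n_min=1, n_max=3, E_B=[2.0, 8.0], num_C=2),
}
totals = {
    name: sum(outgoing_probabilities(0.125, 0.05, 0.05, next_pmf=next_pmf, p_lost=0.04, **kw))
    for name, kw in cases.items()
}
```

Each entry of `totals` should equal 1 up to floating-point rounding, mirroring the four cases treated in the proof.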



5. GENERATING THE OPTIMAL CONTROL 
POLICY AND A VEHICLE CONTROL STRATEGY 

After obtaining the MDP model, we solve our proposed problem by using the PCTL control synthesis approach presented in [Lahijanian et al., 2010], translating the problem to a PCTL formula. The Mission Objective is equivalent to the temporal logic statement "eventually reach $r_p$ and then $r_d$ while always staying alive", which can be translated to the following formula $\phi$:

\[ \mathcal{P}_{max=?}\,[\,\text{alive}\ \mathcal{U}\ (\text{alive} \wedge r_p \wedge \mathcal{P}_{>0}\,[\,\text{alive}\ \mathcal{U}\ (\text{alive} \wedge r_d)\,])\,]. \tag{13} \]

Because formula $\phi$ has two temporal operators $\mathcal{U}$ (until), two maximum reachability probability problems (see Baier et al. [2008]) over the MDP need to be solved. It should be noted that the nested $\mathcal{P}$-operator in formula $\phi$ (i.e., $\mathcal{P}_{>0}[\psi]$, with $\psi$ the inner until formula) finds the control policy that maximizes the probability of satisfying $\psi$ and returns all the initial states from which $\psi$ is satisfied with probability greater than zero under this policy.
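The two-stage evaluation of the nested formula can be illustrated on a toy MDP. The following Python sketch is an assumption-laden illustration: it uses value iteration to approximate the maximum probability of an until formula, whereas the tool used in this paper solves linear programs on top of PRISM. The toy states, actions, and probabilities are hypothetical.

```python
def max_until_probability(mdp, safe, goal, tol=1e-12, max_iter=10_000):
    """Value iteration for Pmax[safe U goal]; mdp[state][action] = [(prob, next_state), ...]."""
    value = {s: (1.0 if s in goal else 0.0) for s in mdp}
    policy = {}
    for _ in range(max_iter):
        delta = 0.0
        for s in mdp:
            if s in goal or s not in safe:
                continue  # goal states stay at 1, unsafe states stay at 0
            best_action, best_value = None, 0.0
            for action, successors in mdp[s].items():
                q = sum(p * value[t] for p, t in successors)
                if best_action is None or q > best_value:
                    best_action, best_value = action, q
            delta = max(delta, abs(best_value - value[s]))
            value[s], policy[s] = best_value, best_action
        if delta < tol:
            break
    return value, policy

# Hypothetical toy model: start -> pick-up -> drop-off, with a chance of losing the vehicle.
mdp = {
    "start": {"safe": [(0.9, "pickup"), (0.1, "lost")],
              "risky": [(0.5, "pickup"), (0.5, "lost")]},
    "pickup": {"go": [(0.8, "dropoff"), (0.2, "lost")]},
    "dropoff": {"stay": [(1.0, "dropoff")]},
    "lost": {"stay": [(1.0, "lost")]},
}
alive = {"start", "pickup", "dropoff"}

# Stage 1: inner formula, alive U (alive and r_d); keep states with positive probability.
inner_value, _ = max_until_probability(mdp, alive, {"dropoff"})
sat_inner = {s for s, v in inner_value.items() if v > 0.0}

# Stage 2: outer formula, alive U (alive and r_p and Sat(inner)).
outer_goal = {"pickup"} & alive & sat_inner
outer_value, outer_policy = max_until_probability(mdp, alive, outer_goal)
```

In this toy model the inner stage gives a value of about 0.72 at `start`, and the outer stage gives 0.9 with the action `safe` chosen at `start`, because the outer operator only requires reaching a pick-up state from which the drop-off region is reachable with positive probability.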

The PCTL control synthesis tool takes an MDP and a PCTL formula $\phi$ and returns the control policy that maximizes the probability of satisfying $\phi$, as well as the corresponding probability value, by solving two linear programming problems. The tool is based on the off-the-shelf PCTL model-checking tool PRISM (see Kwiatkowska et al. [2004]).

We use Matlab to construct the MDP $\mathcal{M}$. The construction takes as input the partitioned environment defined by $R$ and $F$, the motion capability $\Delta$ of the vehicle, the values of $p^{init}_{r}$, $\mu_l(r)$ and $\mu_e(r)$ for all $r \in R$, and the values of $p^{lost}_{\delta}$ and $\lambda(\delta)$ for all $\delta \in \Delta$. The MDP $\mathcal{M}$, together with $\phi$, is then passed to the PCTL control synthesis tool. The output of the control synthesis tool is the optimal control policy that maximizes the probability of satisfying $\phi$. This policy can be directly translated to the desired vehicle control strategy.

The computational complexity of our approach is as follows. Given $R$, $A_r$, $N_r$, $M_r$, $N^o$ and $D_r$, the size of the MDP $\mathcal{M}$ is bounded above by $\max_{r \in R}\,(|B| \times (N_r - M_r + 1) \times N^o \times |D_r| \times 2)$, where $|B|$ is bounded above by $|R| \times |A_r|$ and $|D_r|$ is bounded above by $(2(N_r - M_r) + 1)^{|A_r|}$. The time complexity of the control synthesis tool is polynomial in the size of the MDP and linear in the number of temporal operators.
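To get a feel for how this bound grows, it can be evaluated for given parameters. The Python sketch below is illustrative only: it evaluates the expression for a single region (the bound takes the maximum over all regions), it reads the bound on $|D_r|$ as a power of $|A_r|$, and the function name and the example parameter values are hypothetical.

```python
def mdp_size_bound(num_regions, num_adjacent, n_max, n_min, num_density_levels):
    """Upper bound on the number of MDP states for one region r.

    num_regions        : |R|
    num_adjacent       : |A_r|
    n_max, n_min       : N_r and M_r, the largest and smallest adversary counts
    num_density_levels : N^o
    """
    bound_B = num_regions * num_adjacent                    # bound on |B|
    num_counts = n_max - n_min + 1                          # possible values of n
    bound_D = (2 * (n_max - n_min) + 1) ** num_adjacent     # bound on |D_r| (assumed power)
    return bound_B * num_counts * num_density_levels * bound_D * 2  # factor 2: alive or lost

# Hypothetical parameters: 11 regions, 4 adjacent regions, n in {1, 2, 3}, one density level.
example_bound = mdp_size_bound(11, 4, 3, 1, 1)
```

The term that grows fastest is the bound on $|D_r|$, which is exponential in the number of adjacent regions.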

6. SIMULATOR OF THE ENVIRONMENT 

We constructed a realistic test environment in order to obtain the probability $p^{lost}_{\delta}$ (Eq. 4) from existing data on the distribution of obstacles in each region, and values for the rate of the vehicle, $\lambda(\delta)$, $\delta \in \Delta$. This test environment consists of several components, which are shown in Fig. 5.



We computed $p^{lost}_{\delta}(o)$ using sampling (Fig. 5). The random parameters that we considered were the size and position of objects in a region. Specifically, given the obstacle density $o$, we generated a random map by instantiating obstacles with random positions and sizes so that the density was $o$. The map was provided to the planner that generated a path. We used a symbolic control approach to plan the motion of the vehicle in the environment. Specifically, to implement the planner at the top of Fig. 5, we used the vehicle motion primitives defined in [Frazzoli et al., 2005]. The successes and failures for each path were recorded. When a feasible path was found, a standard model of the dynamics of a helicopter [Bullo and Lewis, 2004] was used to simulate a trajectory following the path and compute $\lambda(\delta)$.
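The text does not state how $\lambda(\delta)$ is computed from the simulated trajectories. One natural choice, given the assumption of exponentially distributed traversal times, is the maximum-likelihood estimate, i.e., the reciprocal of the mean traversal time. The Python sketch below illustrates that choice; it is an assumption, and the sample data are synthetic.

```python
import random

def estimate_rate(traversal_times):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    if not traversal_times:
        raise ValueError("need at least one traversal time")
    return len(traversal_times) / sum(traversal_times)

# Synthetic traversal times drawn with a known rate, used only to exercise the estimator.
rng = random.Random(1)
samples = [rng.expovariate(0.125) for _ in range(50_000)]
estimated_rate = estimate_rate(samples)
```

With enough samples the estimate should be close to the rate used to generate the data (0.125 here).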

We computed the joint probability $p^{lost}_{\delta}(n, o)$ as a combination of the marginal probabilities $p^{lost}_{\delta}(n)$ and $p^{lost}_{\delta}(o)$. The main reason for this approach was that, while an accurate model is available to compute the probability of failing to traverse a region due to obstacles, the effect of adversaries is difficult to model; modeling it is part of our future work. For the purposes of the case study in Sec. 7, we assumed the probability of losing the vehicle due to adversaries to be $p^{lost}_{\delta}(n) = 0.01\, n^2$ for $n \in [0, 10]$. After the marginal probabilities were obtained, we constructed the joint probability $p^{lost}_{\delta}(n, o)$ using the following formula (see [Nelsen, 2006]):

\[ -\sqrt{-\log(p^{lost}_{\delta}(n)) - \log(p^{lost}_{\delta}(o))} \]
Fig. 5. Test environment used to compute the probability $p^{lost}_{\delta}(n, o)$ and the rate $\lambda(\delta)$.

In order to obtain $p^{lost}_{\delta}$, we first generated the marginal probability $p^{lost}_{\delta}(o)$, $\delta = (f, f')$, as the probability of losing the vehicle while traversing region $r$ from facet $f$ to $f'$ with obstacle density $o$. This probability depends on the motion planning algorithm for the vehicle traversing the region, and on the ability of the vehicle to detect obstacles. We assumed that the obstacle data in the environment was accurate and that there was no need for real-time obstacle detection. We used a probabilistic road-map planner [LaValle, 2006, Frewen et al., 2011] to solve the following problem: given a starting point on a facet $f$ and an ending point on the facet $f'$, find a shortest collision-free path between them. The planner uses a randomized algorithm that consists of building a random graph over the free space in the environment, and finding the shortest feasible collision-free path. Because of the randomized nature of the algorithm, there is a non-zero probability that a path cannot be found by the planner even if one exists. This is the probability $p^{lost}_{\delta}(o)$, because it is the probability that the vehicle cannot safely traverse from facet $f$ to $f'$.
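The sampling procedure can be sketched in a few lines of Python. This is a simplified stand-in, not the simulator used in this paper: obstacles are cells of an occupancy grid, each occupied independently with probability equal to the density, and the probabilistic road-map planner is replaced by a deterministic breadth-first search. A failure here therefore means that no path exists in the sampled map, rather than that a randomized planner missed an existing path. All names and parameter values are hypothetical.

```python
import random
from collections import deque

def random_grid(size, density, rng):
    """Occupancy grid in which each cell is an obstacle with probability `density`."""
    return [[rng.random() < density for _ in range(size)] for _ in range(size)]

def path_exists(grid):
    """Breadth-first search from the left column (facet f) to the right column (facet f')."""
    size = len(grid)
    queue = deque((row, 0) for row in range(size) if not grid[row][0])
    seen = set(queue)
    while queue:
        row, col = queue.popleft()
        if col == size - 1:
            return True
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and not grid[nxt[0]][nxt[1]] and nxt not in seen):
                seen.add(nxt)
                queue.append(nxt)
    return False

def estimate_p_lost(density, size=20, trials=2000, seed=0):
    """Fraction of sampled maps in which the planner finds no path between the facets."""
    rng = random.Random(seed)
    failures = sum(not path_exists(random_grid(size, density, rng)) for _ in range(trials))
    return failures / trials
```

Calling `estimate_p_lost(o)` for several densities `o` yields an empirical curve of the failure frequency as a function of obstacle density, which is the role played by $p^{lost}_{\delta}(o)$ in the model.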



7. RESULTS 

We considered the scenario shown in Fig. 2(a), together with the partitioned environment and the possible motion of the vehicle $\Delta$ shown in Fig. 2(b). The initial probability mass function for adversaries in region $r \in R$, $p^{init}_{r}$, and the probability mass function of the obstacle density in region $r \in R$, $p^{o}_{r}$, are given in Table 1. In addition, we assumed that there is no adversary or obstacle in regions $r_p$ and $r_d$. The probability $p^{lost}_{\delta}(n, o)$ and the rates of the vehicle $\lambda(\delta)$ for all $\delta \in \Delta$ were obtained from the simulator.
We used the following numerical values: $\lambda((f, f')) = 0.128$ when $f$ and $f'$ are facets of $r_1$ and $r_5$, $\lambda((f, f')) = 0.125$ when $f$ and $f'$ are facets of $r_2$, r§, $r_9$, $r_{10}$, and $r_{11}$, and $\lambda((f, f')) = 0.091$ when $f$ and $f'$ are facets of tq, and rj, with $\mu_e(r) = \mu_l(r) = 0.05$ for all $r \in R$.
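As a quick reading of these values, assuming the exponential traversal-time model and an unspecified time unit, the mean traversal times $1/\lambda(\delta)$ are

\[ \frac{1}{0.128} \approx 7.81, \qquad \frac{1}{0.125} = 8, \qquad \frac{1}{0.091} \approx 10.99. \]

For illustration only, consider a hypothetical state with $n = 2$ adversaries in the current region, in which adversaries can leave but none can enter, so that $\nu_e = \lambda(\delta) + \mu_l(r)\, n$:

\[ \nu_e = 0.125 + 0.05 \cdot 2 = 0.225, \qquad \frac{\lambda(\delta)}{\nu_e} = \frac{0.125}{0.225} \approx 0.556. \]

In such a state the vehicle would reach the next facet before any adversary leaves with probability of about 0.556.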

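Table 1. Obstacle density and adversary distribution

Region     Obstacle   Adversary distribution                           Adversary distribution
           density    (case A)                                         (case B)
--------------------------------------------------------------------------------------------------------------
$r_1$      1%         $p^{init}_{r_1}(0) = 1$                          $p^{init}_{r_1}(0) = 1$
$r_2$      3%         $p^{init}_{r_2}(x) = 1/3$, $x \in [7, 9]$        $p^{init}_{r_2}(x) = 1/3$, $x \in [2, 4]$
$r_3$      6%         $p^{init}_{r_3}(x) = 1/3$, $x \in [7, 9]$        $p^{init}_{r_3}(x) = 1/3$, $x \in [2, 4]$
$r_4$      5%         $p^{init}_{r_4}(x) = 1/3$, $x \in [1, 3]$        $p^{init}_{r_4}(x) = 1/3$, $x \in [2, 4]$
$r_5$      1%         $p^{init}_{r_5}(x) = 1/3$, $x \in [7, 9]$        $p^{init}_{r_5}(x) = 1/3$, $x \in [2, 4]$
$r_6$      9%         $p^{init}_{r_6}(x) = 1/3$, $x \in [7, 9]$        $p^{init}_{r_6}(x) = 1/3$, $x \in [2, 4]$
$r_7$      9%         $p^{init}_{r_7}(x) = 1/3$, $x \in [1, 3]$        $p^{init}_{r_7}(x) = 1/3$, $x \in [2, 4]$
$r_8$      3%         $p^{init}_{r_8}(x) = 1/3$, $x \in [1, 3]$        $p^{init}_{r_8}(x) = 1/3$, $x \in [2, 4]$
$r_9$      4%         $p^{init}_{r_9}(x) = 1/3$, $x \in [1, 3]$        $p^{init}_{r_9}(x) = 1/3$, $x \in [4, 6]$
$r_{10}$   4%         $p^{init}_{r_{10}}(x) = 1/3$, $x \in [1, 3]$     $p^{init}_{r_{10}}(x) = 1/3$, $x \in [4, 6]$
$r_{11}$   3%         $p^{init}_{r_{11}}(x) = 1/3$, $x \in [7, 9]$     $p^{init}_{r_{11}}(x) = 1/3$, $x \in [2, 4]$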
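The entries of Table 1 can be read as uniform probability mass functions over three consecutive integers, since each listed value carries mass 1/3. The short Python sketch below encodes that reading for a few regions of case A; the interpretation of an interval such as [7, 9] as the integers 7, 8, 9 is an assumption, and the helper names are hypothetical.

```python
def uniform_pmf(low, high):
    """Uniform probability mass function on the integers low..high (inclusive)."""
    support = range(low, high + 1)
    return {x: 1.0 / len(support) for x in support}

def expected_value(pmf):
    """Mean number of adversaries under the given probability mass function."""
    return sum(x * p for x, p in pmf.items())

# A few rows of Table 1, case A (remaining regions follow the same pattern).
case_a = {
    "r1": {0: 1.0},
    "r2": uniform_pmf(7, 9),
    "r4": uniform_pmf(1, 3),
}
means_a = {region: expected_value(pmf) for region, pmf in case_a.items()}
```

Under this reading, a region with support [7, 9] holds 8 adversaries on average, while a region with support [1, 3] holds 2.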



We obtained the vehicle control strategy through the 
method described in Sec. 5. Two vehicle runs are shown 
in Fig. 6, corresponding to case A and case B (Table 1). 




Fig. 6. Runs of the vehicle in the partitioned environment 
for the given mission scenario and the data. Two 
different adversary distributions are given in Table 
1. The arrows represent movement of the vehicle in 
between facets. Red arrows correspond to case A, and 
blue arrows correspond to case B. 

We found the maximum probabilities of satisfying the specification $\phi$ (Eq. 13) for cases A and B to be 0.141 and 0.805, respectively. The substantial difference between these two maximum probabilities is due to the difference in adversary distributions. A close analysis of the vehicle runs together with the adversary distributions shows that in case A the number of adversaries in regions $r_2$, r% and r§ is high, which results in a vehicle control strategy that ensures that the vehicle avoids these regions.

For this particular case study, the MDP $\mathcal{M}$ had 1079 states. The Matlab code used to construct $\mathcal{M}$ ran for approximately 14 minutes on a MacBook Pro computer with a 2.5 GHz dual core processor. Furthermore, the control synthesis tool took 4 minutes to generate the optimal policy.

8. CONCLUSIONS AND FINAL REMARKS 

In this paper we provided an approach to obtain a reactive 
control strategy that provides probabilistic guarantees for 
achieving a mission objective in a threat-rich environment. 
We modeled the motion of the vehicle, as well as vehicle 
estimates of the adversary distributions as an MDP. We 
then found the optimal control strategy for the vehicle 
maximizing the probability of satisfying a given mission 
task specified as a PCTL formula. 

Future work includes extensions of this approach to a richer specification language such as probabilistic Linear Temporal Logic (PLTL) and a more general model of the vehicle in the environment such as a Partially Observable Markov Decision Process (POMDP).